popcompr is an R package that makes it easier to compare high-resolution population datasets for humanitarian and research purposes. It is under active development and has not been released, but the code is available under the GPL-3 license.
Included in the package are files to reproduce a minimal working example for the country of Lesotho: `lesotho_wp_2019` and `lesotho_fb_2019` (use `?lesotho_wp_2019` for more details).
``` r
library(popcompr)
#> Warning: replacing previous import 'data.table::shift' by 'raster::shift' when
#> loading 'popcompr'
library(raster)
#> Loading required package: sp
library(data.table)
#> 
#> Attaching package: 'data.table'
#> The following object is masked from 'package:raster':
#> 
#>     shift
library(foreach)
library(plotly)
#> Loading required package: ggplot2
#> 
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#> 
#>     last_plot
#> The following object is masked from 'package:raster':
#> 
#>     select
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following object is masked from 'package:graphics':
#> 
#>     layout
library(ggplot2)
library(fasterize)
#> 
#> Attaching package: 'fasterize'
#> The following object is masked from 'package:graphics':
#> 
#>     plot
#> The following object is masked from 'package:base':
#> 
#>     plot
library(sf)
#> Linking to GEOS 3.8.1, GDAL 3.1.4, PROJ 6.3.1
```
You can first get an estimate of how long the comparison will take:
``` r
# comparing at pixel level with data included in the package
lesotho_fb_2019 <- raster(system.file("external/lso_facebook_2019.tif", package = "popcompr"))
lesotho_wp_2019 <- raster(system.file("external/lso_worldpop_2019.tif", package = "popcompr"))
pop_list <- list(lesotho_wp_2019, lesotho_fb_2019)
compare_pop(pop_list, parallel = FALSE, estimate_time = TRUE)
#> It will take approximately 18.41 seconds to complete the full job serially.
```
That estimate is short, but we’ll work through parallelizing anyway. This example uses the doParallel backend, but other do packages can be used (anything compatible with the `%dopar%` infix operator from the foreach package).
``` r
# parallelized example
library(doParallel)
#> Loading required package: iterators
#> Loading required package: parallel
cl <- makeCluster(detectCores() - 1) # use all but one of the available cores
registerDoParallel(cl)
system.time(
  # defaults to estimate_time = FALSE & a resolution of ~1 km at the equator
  exe <- compare_pop(pop_list, parallel = TRUE)
)
#> [1] "warning: 6.08053179085255 people were not matched."
#>    user  system elapsed 
#>   0.536   0.026  13.522
stopCluster(cl)
```
`compare_pop` will warn you if any people were not resampled to the comparison grid (this can happen when populated cells sit at the edge of the original raster).
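As a sanity check, you can compare total population before and after resampling yourself; the gap should be on the order of the warning above. This is a sketch using `cellStats()` from the raster package (it assumes the layer in `exe` is named `lso_worldpop_2019`, matching the column names used below):

``` r
# Sum all non-NA cells in the original raster and in the resampled
# comparison layer; a small difference is expected per the warning.
cellStats(lesotho_wp_2019, "sum") - cellStats(exe[["lso_worldpop_2019"]], "sum")
```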
The output is a raster brick with a layer corresponding to each of the input population rasters. We can visualize and compare the rasters:
``` r
comp_dt <- data.table(as.data.frame(exe, xy = TRUE)) # we use data.table as it's easier to transform

# functions for visualizing differences on a signed log scale
transform <- function(x) {
  logged <- log(abs(x) + 1e-6) * sign(x)
  return(logged)
}

inv_trans <- function(x) {
  inv <- (exp(abs(x)) - 1e-6) * sign(x)
  return(round(inv, 2))
}

map_compare <- ggplot(comp_dt) +
  geom_raster(aes(x = x, y = y,
                  fill = transform(lso_worldpop_2019 - lso_facebook_2019))) +
  scale_fill_gradient2(labels = inv_trans,
                       name = "Difference between \n population estimates") +
  coord_quickmap()
map_compare
```

``` r
hex_compare <- ggplot(comp_dt) +
  geom_hex(aes(x = lso_worldpop_2019, y = lso_facebook_2019), color = "grey") +
  geom_abline(intercept = 0, slope = 1, linetype = 2, color = "grey") +
  scale_fill_distiller(direction = 1, trans = "log", labels = function(x) round(x, -1))
hex_compare
#> Warning: Removed 56799 rows containing non-finite values (stat_binhex).
```

You can also make these interactive using plotly’s `ggplotly` function:
``` r
ggplotly(map_compare)
```
``` r
ggplotly(hex_compare)
#> Warning: Removed 56799 rows containing non-finite values (stat_binhex).
```
Check the distribution of differences:
``` r
ggplot(comp_dt) +
  geom_histogram(aes(x = lso_worldpop_2019 - lso_facebook_2019))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> Warning: Removed 56799 rows containing non-finite values (stat_bin).
```

You can access country shapefiles using `get_country_shape`, which returns a shapefile as an sf object. These are served through the geoBoundaries API, where more documentation on the available datasets can be found. In addition, the package provides a `geoboundaries` dataset for looking up country codes and available admin levels.
``` r
# Find the right iso code & see which admin levels are available
dplyr::filter(geoboundaries, grepl("Les", country))
#>   iso_code   year admin_level                     source
#> 1      LSO 2017.0        ADM0              OpenStreetMap
#> 2      LSO 2017.0        ADM1              OpenStreetMap
#> 3      LSO 2020.0        ADM2 Humanitarian Data Exchange
#> 4      TLS 2017.0        ADM0              OpenStreetMap
#> 5      TLS 2017.0        ADM1              OpenStreetMap
#> 6      TLS 2018.0        ADM2                        HDX
#>                                         license     country
#> 1   Open Data Commons Open Database License 1.0     Lesotho
#> 2   Open Data Commons Open Database License 1.0     Lesotho
#> 3                                                   Lesotho
#> 4   Open Data Commons Open Database License 1.0 Timor-Leste
#> 5   Open Data Commons Open Database License 1.0 Timor-Leste
#> 6 Humanitarian use only - Noted in metadata tab Timor-Leste
```
We get the Lesotho shapefile at admin level 2 (the `type` argument defaults to a simplified shapefile, which is faster to download and plot, but users may prefer unsimplified files for better accuracy; see the geoBoundaries API documentation for more information).
``` r
les_shape <- get_country_shape(country_iso = "LSO", admin_level = 2)
#> Reading layer `geoBoundariesSSCU-3_0_0-LSO-ADM2' from data source `https://geoboundaries.org/data/geoBoundariesSSCU-3_0_0/LSO/ADM2/geoBoundariesSSCU-3_0_0-LSO-ADM2.geojson' using driver `GeoJSON'
#> Simple feature collection with 78 features and 5 fields
#> geometry type:  POLYGON
#> dimension:      XY
#> bbox:           xmin: 27.01125 ymin: -30.67785 xmax: 29.45737 ymax: -28.57103
#> geographic CRS: WGS 84
```
Then we can aggregate the population rasters to it:
``` r
# Aggregate the comparison brick to the admin 2 shapefile
les_shape <- aggregate_to_shp(brick = exe, sf = les_shape, max_adjacent = 100)
#> Warning in gmin(match, na.rm = TRUE): No non-missing values found in at least
#> one group. Returning 'Inf' for such groups to be consistent with base
#> Warning in gmin(match, na.rm = TRUE): No non-missing values found in at least
#> one group. Returning 'Inf' for such groups to be consistent with base
#> Warning in gmin(match, na.rm = TRUE): No non-missing values found in at least
#> one group. Returning 'Inf' for such groups to be consistent with base
#> Warning in gmin(match, na.rm = TRUE): No non-missing values found in at least
#> one group. Returning 'Inf' for such groups to be consistent with base
#> Warning in gmin(match, na.rm = TRUE): No non-missing values found in at least
#> one group. Returning 'Inf' for such groups to be consistent with base
#> Warning in gmin(match, na.rm = TRUE): No non-missing values found in at least
#> one group. Returning 'Inf' for such groups to be consistent with base
```
`aggregate_to_shp` will warn you if grid-cell values go unallocated to the shapefile. If many people go missing, you can increase `max_adjacent` (which determines how many grid cells to buffer through when looking for the nearest non-NA neighbor), but it is also wise to check the extent and boundaries of the shapefile against the population rasters.
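For example (a sketch only; the larger `max_adjacent` value is illustrative, not a recommendation), you could re-run the aggregation with a bigger buffer and compare the coverage of the raster and the shapefile:

``` r
# Widen the search buffer if many people went unallocated
les_shape_wide <- aggregate_to_shp(brick = exe, sf = les_shape, max_adjacent = 500)

# Check that the raster and shapefile cover the same area
extent(exe)        # raster extent (raster package)
st_bbox(les_shape) # shapefile bounding box (sf package)
```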
We can visualize these differences as well:
``` r
map_compare <- ggplot(les_shape) +
  geom_sf(aes(fill = transform(lso_worldpop_2019 - lso_facebook_2019))) +
  scale_fill_gradient2(labels = inv_trans,
                       name = "Difference between \n population estimates")
map_compare
```

``` r
hex_compare <- ggplot(les_shape) +
  geom_hex(aes(x = lso_worldpop_2019, y = lso_facebook_2019), color = "grey") +
  geom_abline(intercept = 0, slope = 1, linetype = 2, color = "grey") +
  scale_fill_distiller(direction = 1, trans = "log", labels = function(x) round(x, -1))
hex_compare
```

You can also make these interactive using plotly’s `ggplotly` function:
``` r
ggplotly(map_compare)
```
``` r
ggplotly(hex_compare)
```
Check the distribution of differences:
``` r
ggplot(les_shape) +
  geom_histogram(aes(x = lso_worldpop_2019 - lso_facebook_2019))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```
